The Standard Error of Equipercentile Equating
Abstract 

The standard error of an equipercentile equating is derived for
four different situations. Some numerical results are checked by Monte
Carlo methods. Numerical standard errors are computed for two sets of ...


The Standard Error of Equipercentile Equating*


It is frequently desired to use scores on two or more forms of
the same test interchangeably. If the test forms differ in difficulty or
in other ways, some transformation of the raw test scores must be made
to adjust for these differences. Transformations that (attempt to) make
scores on different forms interchangeable are called equatings. In
equipercentile equating, these transformations are determined by the
requirement that, for some specified population, the cumulative frequency
distribution of the transformed scores shall (theoretically) be the same
regardless of the test form administered.

In practice, some empirical procedure is used to implement this
theoretical definition for an actual sample of examinees. Sampling
fluctuation in the resulting empirical equating is the subject of
concern here.

Consider the following empirical large-sample equipercentile
equating procedure:

1. Administer tests X and Y to N examinees. Score both
tests.

2. Given any fixed $x_0$, find the score $y'$ that has the same
sample cumulative frequency.

3. Assert that the scores $(x_0, y')$ are asymptotically equated.

(This procedure is slightly biased, since $Nq$ observations lie below
$x_0$ and only $Nq - 1$ observations lie below $y'$. We ignore this,
since it will not affect the asymptotic variance.)
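The three steps above can be sketched in Python for the single-group, continuous-score case. The helper name `equate_single_group` and the simulated data are illustrative assumptions, not part of the original procedure. The sketch takes $y'$ to be the $k$-th smallest Y score, where $k = Nq$ is the number of X scores below $x_0$, so it reproduces the slight bias just described.

```python
import random


def equate_single_group(xs, ys, x0):
    """Hypothetical helper: return the score y' equated to a fixed x0.

    xs and ys are the X and Y scores of the same N examinees. With k = Nq
    X scores below x0, y' is the k-th smallest Y score, so k - 1 Y scores
    lie below it (the slight bias noted in the text).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    k = sum(1 for x in xs if x < x0)  # k = N * q
    if k == 0:
        raise ValueError("no X scores fall below x0")
    return sorted(ys)[k - 1]


# Illustrative data: Y behaves like X shifted up by 1.0, plus noise.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]
print(equate_single_group(xs, ys, 0.0))  # roughly 1.0 in large samples
```

Sorting the whole Y sample is the simplest way to get an order statistic; for large N a selection algorithm would avoid the full sort.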

*This work was supported in part by contract N00014-80-C-0402,
project designation NR150-453, between the Office of Naval Research and
Educational Testing Service. Reproduction in whole or in part is
permitted for any purpose of the United States Government.


We must now find the asymptotic sampling variance of this $y'$
for fixed $x_0$. We will consider first an (unrealistic) case where
the test score is a continuous variable, then a more usual case where
the test score is a nonnegative integer.

If tests X and Y are given to the same examinees, there
is likely to be a practice effect or a fatigue effect on the second
test administered. To avoid this, it is common to give tests X and
Y to different random samples from the same population of examinees.
We consider this case first.


1. Continuous Case, Two Groups


Let $F(x)$ and $G(y)$ denote the cumulative frequency distributions
of scores $x$ and $y$ in the population. We administer test X
to a sample of $N_1$ examinees and find that in the sample a proportion
$q$ of these fall below the chosen fixed value $x_0$. Having administered
test Y to a sample of $N_2$ examinees from the same population, we
denote the $q$-th order statistic in this sample (that is, the sample
$q$-quantile) by $y'$ and assert that $y'$ is equivalent (equated) to
$x_0$. We wish to find the asymptotic sampling variance of $y'$ (it is
always to be understood that $x_0$ is fixed).

For fixed $q$, $y'$ is asymptotically normally distributed with
mean $\mu_{y|q}$ determined by the relation

$$G(\mu_{y|q}) = q \tag{1}$$

and variance

$$\sigma^2_{y|q} = \frac{pq}{N_2\,[g(\mu_{y|q})]^2}, \tag{2}$$

where $p \equiv 1 - q$ and $g(y)$ is the probability density at $y$
(Kendall & Stuart, 1969, Sections 14.11-14.12). When $q$ is random,
a well-known identity gives

$$\operatorname{Var} y' = \operatorname{Var}(\mu_{y|q}) + \mathcal{E}(\sigma^2_{y|q}). \tag{3}$$

From (1),

$$\frac{d\mu_{y|q}}{dq} = \frac{1}{g(\mu_{y|q})}.$$

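By the delta method, since $q$ is a sample proportion with expected
value $Q$ and variance $PQ/N_1$, we find

$$\operatorname{Var}(\mu_{y|q}) \doteq \frac{PQ}{N_1 g_0^2}, \tag{4}$$

where $Q$ is defined by

$$Q \equiv F(x_0), \tag{5}$$

$P \equiv 1 - Q$, $g_0 \equiv g(y_0)$, and $y_0$ is defined by

$$G(y_0) = Q. \tag{6}$$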


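Similarly,

$$\mathcal{E}(\sigma^2_{y|q}) = \mathcal{E}\!\left(\frac{pq}{N_2 g^2}\right) \doteq \frac{\mathcal{E}(pq)}{N_2 g_0^2} = \frac{P - \operatorname{Var} p - P^2}{N_2 g_0^2} \doteq \frac{PQ}{N_2 g_0^2}, \tag{7}$$

where $g \equiv g(\mu_{y|q})$; the last step drops $\operatorname{Var} p = PQ/N_1$,
which is of lower order. Substituting (4) and (7) into (3), we have
finally

$$\operatorname{Var} y' \doteq \frac{PQ}{g_0^2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right). \tag{8}$$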
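Formula (8) is easy to evaluate once $Q$ and $g_0$ are known. The sketch below assumes, as in the numerical work of Section 5, that X and Y are both standard normal, so that $y_0 = x_0$, $Q = \Phi(x_0)$, and $g_0 = \phi(x_0)$; the function names are our own. With $N_1 = N_2 = 1000$ it should reproduce, to rounding, the second column of Table 1.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def se_eq8(Q, g0, n1, n2):
    """Square root of (8): Var y' ~= PQ / g0^2 * (1/N1 + 1/N2)."""
    P = 1.0 - Q
    return math.sqrt(P * Q / g0 ** 2 * (1.0 / n1 + 1.0 / n2))


# Assumption: X and Y are both N(0, 1), so y0 = x0, Q = F(x0), g0 = g(x0).
for x0 in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    se = se_eq8(normal_cdf(x0), normal_pdf(x0), 1000, 1000)
    print(f"x0 = {x0:.1f}   standard error = {se:.3f}")
```

The rapid growth toward the tail comes from $g_0^2$ in the denominator: the density falls faster than $PQ$ as $x_0$ moves away from the center.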


2. Discrete Case, Two Groups


Consider next the case where scores $x$ and $y$ are nonnegative
integers. For convenience, we will always pick $x_0$ to be an integer
plus 0.5. Let $F(x)$ and $G(y)$ be distribution functions continuous
to the right of each integer, and let $y_0$ be the integer defined by the
relation $G(y_0 - 1) < Q \equiv F(x_0) \le G(y_0)$. It will ordinarily be
asymptotically infinitely unlikely that $G(y_0) = Q$; for simplicity,
we will assume hereafter that $G(y_0) > Q$.

If we now define $y'$ as in the preceding section, the probability
that the integers $y'$ and $y_0$ are equal approaches one as $N$ increases.
This means, to the usual order of approximation, that $y'$ will have an
asymptotic variance of zero. Let us use linear interpolation to
define a value $y''$ as follows:

$$y'' \equiv y' - 0.5 + \frac{q - G^-}{g}, \tag{9}$$


where $G^-$ is the observed proportion of Y scores below the integer
$y'$ and $g$ is the observed proportion of scores at $y'$. In the
discrete case we will assert that $y''$ is equated to $x_0$. It is the
asymptotic variance of $y''$ that is now required.
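Rule (9) can be sketched as follows for integer scores. The function name and the use of exact fractions are our own choices. Here $y'$ is taken as the smallest observed Y score whose cumulative proportion reaches $q$, which matches the order-statistic definition of the preceding section whenever $qN_2$ is an integer.

```python
from fractions import Fraction


def equate_discrete(x_scores, y_scores, x0):
    """Hypothetical helper: y'' from (9) for integer scores, x0 = integer + 0.5.

    q  = proportion of X scores below x0
    y' = smallest observed Y score whose cumulative proportion reaches q
    G- = proportion of Y scores below y'; g = proportion of Y scores at y'
    """
    q = Fraction(sum(1 for x in x_scores if x < x0), len(x_scores))
    if q == 0:
        raise ValueError("no X scores fall below x0")
    n_y = len(y_scores)
    for y_prime in sorted(set(y_scores)):
        g_minus = Fraction(sum(1 for y in y_scores if y < y_prime), n_y)
        g = Fraction(sum(1 for y in y_scores if y == y_prime), n_y)
        if g_minus + g >= q:
            # Linear interpolation within the score interval [y' - 0.5, y' + 0.5]
            return float(y_prime - Fraction(1, 2) + (q - g_minus) / g)
    raise AssertionError("unreachable: the cumulative proportion reaches 1")


print(equate_discrete([0, 0, 0, 1, 2, 2, 3, 3], [0, 1, 1, 1, 2, 2, 3, 3], 0.5))
# -> 1.1666666666666667 (that is, 7/6)
```

Exact fractions avoid spurious floating-point failures of the comparison `g_minus + g >= q` at the boundary where the cumulative proportion equals $q$ exactly.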

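As noted above, the variance of $y'$ is asymptotically zero. The
proportions $q$, $g$, and $G^-$ are asymptotically normally distributed
with known variances and covariances:

$$\sigma_q^2 = \frac{PQ}{N_1}, \qquad \sigma_g^2 = \frac{g_0(1 - g_0)}{N_2}, \qquad \sigma_{G^-}^2 = \frac{G_0^-(1 - G_0^-)}{N_2},$$

$$\sigma_{qG^-} = \sigma_{qg} = 0, \qquad \sigma_{gG^-} = -\frac{g_0 G_0^-}{N_2}.$$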


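Now

$$\frac{\partial y''}{\partial q} = \frac{1}{g}, \qquad \frac{\partial y''}{\partial G^-} = -\frac{1}{g}, \qquad \frac{\partial y''}{\partial g} = -\frac{q - G^-}{g^2},$$

so that, by the delta method,

$$\operatorname{Var} y'' \doteq \frac{1}{g_0^2}\left[\frac{PQ}{N_1} + \frac{PQ}{N_2} - \frac{(G_0 - Q)(Q - G_0^-)}{N_2\, g_0}\right], \tag{10}$$

where $G_0 \equiv G(y_0)$ and $G_0^- \equiv G_0 - g_0$. Note that the last fraction
in (10) approaches zero as $g_0 \to 0$, since its two numerator factors are
nonnegative and sum to $g_0$. Thus as $g_0$ becomes small, (10) approaches
(8), the variance for the continuous case.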
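To check the algebra behind (10), the sketch below evaluates the delta-method sum term by term, using the three partial derivatives and the variances and covariances listed above, and compares it with the closed form. The function names and the parameter values are illustrative assumptions.

```python
import math


def var_eq10(Q, G_minus, g0, n1, n2):
    """Closed form (10) for the discrete two-group case."""
    P = 1.0 - Q
    G0 = G_minus + g0  # G(y0)
    return (P * Q / n1 + P * Q / n2
            - (G0 - Q) * (Q - G_minus) / (n2 * g0)) / g0 ** 2


def var_delta_eq9(Q, G_minus, g0, n1, n2):
    """Delta method applied term by term to (9): y'' = y' - 0.5 + (q - G-)/g."""
    d = Q - G_minus
    d_q, d_G, d_g = 1.0 / g0, -1.0 / g0, -d / g0 ** 2  # partial derivatives of y''
    var_q = Q * (1.0 - Q) / n1                 # q comes from the X sample
    var_G = G_minus * (1.0 - G_minus) / n2     # G- and g come from the Y sample
    var_g = g0 * (1.0 - g0) / n2
    cov_gG = -g0 * G_minus / n2                # multinomial covariance; q is independent
    return (d_q ** 2 * var_q + d_G ** 2 * var_G + d_g ** 2 * var_g
            + 2.0 * d_G * d_g * cov_gG)


args = (0.5, 0.2, 0.4, 1000, 1000)  # illustrative Q, G(y0 - 1), g0, N1, N2
print(var_eq10(*args), var_delta_eq9(*args))
print(math.isclose(var_eq10(*args), var_delta_eq9(*args)))
```

Shrinking $g_0$ while keeping $Q$ inside the score interval shows the ratio of (10) to (8) approaching one, as the text states.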


3. Discrete Case, One Group


If there were no practice or fatigue effect, it would be more
efficient to administer both tests to the same students. In order to
see how much difference this would make, we derive the sampling
variance of (9) for this case.

Let $a, b, c, d, k, m$ be sample frequencies, expressed as proportions
of the $N$ examinees, as defined by the accompanying diagram, and let
$\alpha, \beta, \gamma, \delta, \kappa$, and $\mu$ be the corresponding population
proportions:

                 x < x0    x > x0
    y > y0         b         a
    y = y0         k         m
    y < y0         d         c

In the present notation $q \equiv b + k + d$, $G^- \equiv c + d$, and
$g \equiv k + m$, so (9) becomes

$$y'' = y' - 0.5 + \frac{b + k - c}{k + m}. \tag{11}$$


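The sample frequencies are again asymptotically multivariate
normal with the familiar variances and covariances:
$\sigma_b^2 \doteq \beta(1 - \beta)/N$, $\sigma_{bc} \doteq -\beta\gamma/N$, and so forth.

As before,

$$\frac{\partial y''}{\partial b} = \frac{1}{k + m}, \qquad \frac{\partial y''}{\partial c} = -\frac{1}{k + m}, \qquad \frac{\partial y''}{\partial k} = \frac{m - b + c}{(k + m)^2}, \qquad \frac{\partial y''}{\partial m} = -\frac{b + k - c}{(k + m)^2}.$$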


Using the delta method, we finally obtain after some algebra

$$\operatorname{Var} y'' \doteq \frac{\kappa\mu + (\beta + \gamma)\, g_0 + (\beta - \gamma)^2}{N g_0^3}, \tag{12}$$

where $g_0 \equiv \mu + \kappa$.

If $x$ and $y$ are independently distributed, (12) becomes the
same as (10) with $N_1 = N_2 = N$.
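The closed form (12), and the claim that it reduces to (10) under independence, can be checked numerically. The sketch below applies the delta method directly to (11), using the multinomial identity $\operatorname{Var}(\sum_i w_i p_i) = [\sum_i w_i^2 \pi_i - (\sum_i w_i \pi_i)^2]/N$; the function names and the cell proportions are made-up illustrative values.

```python
import math


def var_eq12(cells, n):
    """Closed form (12); cells maps 'a'..'d', 'k', 'm' to population proportions."""
    beta, gamma, kappa, mu = cells["b"], cells["c"], cells["k"], cells["m"]
    g0 = kappa + mu
    return (kappa * mu + (beta + gamma) * g0 + (beta - gamma) ** 2) / (n * g0 ** 3)


def var_delta_eq11(cells, n):
    """Delta method applied directly to (11): y'' = y' - 0.5 + (b + k - c)/(k + m)."""
    b, c, k, m = cells["b"], cells["c"], cells["k"], cells["m"]
    g0 = k + m
    w = {"a": 0.0, "d": 0.0,                      # cells a and d do not enter (11)
         "b": 1.0 / g0, "c": -1.0 / g0,
         "k": (m - b + c) / g0 ** 2, "m": -(b + k - c) / g0 ** 2}
    mean = sum(w[key] * cells[key] for key in cells)
    second = sum(w[key] ** 2 * cells[key] for key in cells)
    return (second - mean ** 2) / n


# Independent x and y: Q = 0.5, G(y0 - 1) = 0.2, g0 = 0.4, so G(y0) = 0.6.
Q, G_minus, g0, n = 0.5, 0.2, 0.4, 1000
P, G0 = 1.0 - Q, G_minus + g0
indep = {"b": Q * (1 - G0), "a": P * (1 - G0), "k": Q * g0,
         "m": P * g0, "d": Q * G_minus, "c": P * G_minus}
eq10 = (P * Q / n + P * Q / n - (G0 - Q) * (Q - G_minus) / (n * g0)) / g0 ** 2
print(var_eq12(indep, n), var_delta_eq11(indep, n), eq10)

# A correlated table (cells sum to 1): (12) still matches the direct delta method.
corr = {"a": 0.30, "b": 0.10, "k": 0.15, "m": 0.15, "d": 0.25, "c": 0.05}
print(math.isclose(var_eq12(corr, n), var_delta_eq11(corr, n)))
```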
4. Continuous Case, One Group


When $x$ and $y$ are continuous, we deal with sample frequencies
$a, b, c, d$ as defined in the accompanying diagram:

                 x < x0    x > x0
    y > y'         b         a
    y < y'         d         c

The corresponding population proportions are denoted by
$\alpha, \beta, \gamma, \delta$; for example, $\delta = \Phi(x_0, y')$, where $\Phi(x, y)$
is the cumulative distribution function of $x$ and $y$.

Since $y'$ is to be the $q$-th order statistic, as in Section 1,
where $q = b + d$, it follows that, given $x_0$, $y'$ must always be
chosen so that $b = c$. For given $x_0$, then, the frequency distribution
of $y'$ is proportional to the probability of finding one person at $y'$,
$c$ persons with $x < x_0$ and $y > y'$, $c$ persons with $x > x_0$ and
$y < y'$, and the remaining persons in cells $a$ and $d$. Writing
$M \equiv N - 1$, this probability is proportional to $g(y')$ times the sum,
over all possible values of $c$, of the multinomial probability

$$\frac{M!}{c!\,c!\,(M - 2c)!}\,\beta^c \gamma^c (\alpha + \delta)^{M - 2c}. \tag{13}$$


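Using Stirling's approximation to the factorials, the distribution
of $y'$ for $M$ even is proportional to

$$g(y') \sum_{c=0}^{M/2} \frac{M^{M + 1/2}\,\beta^c \gamma^c (1 - \beta - \gamma)^{M - 2c}}{2\pi\, c^{2c + 1} (M - 2c)^{M - 2c + 1/2}}. \tag{14}$$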

Taking logs under the summation sign, we have that the asymptotic
distribution of $y'$ is proportional to

$$\sum_{c=0}^{M/2} \exp(\log A),$$

where

$$\begin{aligned} \log A = {} & \log g(y') - \tfrac{1}{2}\log(1 - 2C) - \log 2\pi - \log M - \log C \\ & + M\bigl[C \log \beta\gamma + (1 - 2C)\log(1 - \beta - \gamma) - 2C \log C - (1 - 2C)\log(1 - 2C)\bigr], \end{aligned}$$

where $C \equiv c/M$.
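The passage from (13) to $\log A$ can be checked numerically. The sketch below compares the exact logarithm of the multinomial probability (13), computed with `math.lgamma`, against the Stirling-based expression for $\log A$, omitting $\log g(y')$ from both. It assumes $\alpha + \delta = 1 - \beta - \gamma$ and $C = c/M$ (the ratio the Stirling expansion produces); the function names and parameter values are arbitrary illustrations.

```python
import math


def log_term_exact(M, c, beta, gamma):
    """Exact log of the multinomial probability (13), with alpha + delta = 1 - beta - gamma."""
    return (math.lgamma(M + 1) - 2.0 * math.lgamma(c + 1) - math.lgamma(M - 2 * c + 1)
            + c * math.log(beta * gamma) + (M - 2 * c) * math.log(1.0 - beta - gamma))


def log_A_minus_log_g(M, c, beta, gamma):
    """Stirling form: log A without the log g(y') term, with C = c / M."""
    C = c / M
    return (-0.5 * math.log(1.0 - 2.0 * C) - math.log(2.0 * math.pi)
            - math.log(M) - math.log(C)
            + M * (C * math.log(beta * gamma)
                   + (1.0 - 2.0 * C) * math.log(1.0 - beta - gamma)
                   - 2.0 * C * math.log(C)
                   - (1.0 - 2.0 * C) * math.log(1.0 - 2.0 * C)))


for M, c in ((200, 20), (1000, 100), (1000, 250)):
    diff = log_term_exact(M, c, 0.1, 0.1) - log_A_minus_log_g(M, c, 0.1, 0.1)
    print(M, c, round(diff, 5))  # Stirling error only, roughly -1/(6c)
```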

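Expand $\log A$ in powers of $y \equiv y' - y_0$ and $c \equiv C - \gamma_0$. Using
a zero subscript to denote quantities evaluated at $y_0$ (note that
$\beta_0 = \gamma_0$, but that $d\beta/dy \ne d\gamma/dy$; in fact $d\gamma/dy - d\beta/dy = g$)
and dropping terms of lower order than $M$, we find

$$\log A \doteq -\frac{M}{2\gamma_0(1 - 2\gamma_0)}\Bigl[\tfrac{1}{2}\bigl((2\gamma_0' - g_0)^2 + g_0^2(1 - 2\gamma_0)\bigr) y^2 - 2(2\gamma_0' - g_0)\, yc + 2c^2\Bigr], \tag{15}$$

where $\gamma_0' \equiv d\gamma/dy$ evaluated at $y_0$.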
As in Feller (1950, Section VII.2), we see from (15) that $c$ and
$y'$ are asymptotically bivariate normal. Writing $\lambda \equiv 1 - 2\gamma_0$ and
$h \equiv 2\gamma_0' - g_0$, we see from (15) that [...]
When the correlation between $x$ and $y$ is zero, (20) is the same as
(8) with $N_1 = N_2 = N$.
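The equations that followed this step in the original, leading to (20), are missing from this copy. As a hedged sketch of our own, not a reconstruction of the author's intermediate equations or of (20) itself, the variance of $y'$ follows from (15) by reading its bracket as the exponent of a bivariate normal density in $(y, c)$ and inverting the quadratic form:

$$
\log A \doteq -\tfrac{1}{2}\begin{pmatrix} y & c \end{pmatrix}\mathbf{A}\begin{pmatrix} y \\ c \end{pmatrix},
\qquad
\mathbf{A} = \frac{M}{\gamma_0\lambda}\begin{pmatrix} \tfrac{1}{2}\bigl(h^2 + \lambda g_0^2\bigr) & -h \\ -h & 2 \end{pmatrix},
$$

$$
\det\mathbf{A} = \Bigl(\frac{M}{\gamma_0\lambda}\Bigr)^{2}\lambda g_0^2,
\qquad
\operatorname{Var} y' \doteq \bigl(\mathbf{A}^{-1}\bigr)_{yy} = \frac{2M/(\gamma_0\lambda)}{\det\mathbf{A}} = \frac{2\gamma_0}{M g_0^2} \doteq \frac{2\gamma_0}{N g_0^2}.
$$

Here $\gamma_0$ is the population proportion with $x > x_0$ and $y < y_0$. When $x$ and $y$ are independent, $\gamma_0 = PQ$ and the expression becomes $2PQ/(N g_0^2)$, which is (8) with $N_1 = N_2 = N$. For standard bivariate normal scores with $\rho = .90$ and $x_0 = 0$, $\gamma_0 = \tfrac{1}{4} - \arcsin(.90)/(2\pi) \approx .0718$ and $g_0 = 1/\sqrt{2\pi}$, so with $N = 1000$ the standard error is about .030, the value shown for $x_0 = 0$ in the third column of Table 1.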

5. Numerical Results

Formulas (8) and (20) for the continuous case are simpler than
formulas (10) and (12) for the discrete case. We will first give
some numerical results from formulas (8) and (20).

A Monte Carlo study was carried out by drawing $N = 1000$ pairs of
pseudo-random standardized bivariate normal deviates $(x, y)$ from a
population with a correlation of $\rho = .90$ (this is a typical correlation
between parallel test forms). From this sample of 1000 cases, the equated
value $y'$ was found separately for $x_0 = 0, 0.5, 1.0, 1.5, 2.0, 2.5$.

The foregoing was repeated 1000 times with independently drawn bivariate
samples. For each given $x_0$, the empirical standard deviation $s_{y'}$ was
computed.
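A scaled-down version of this Monte Carlo study can be sketched as follows. The function name, the seed handling, and the use of Python's `random.gauss` are our own assumptions, not details of the original study. Within each replication, $y'$ is computed as in the single-group procedure at the start of the paper: the $k$-th smallest $y$, where $k$ is the number of $x$ values below $x_0$.

```python
import random
import statistics


def monte_carlo_se(x0, rho=0.90, n=1000, reps=1000, seed=0):
    """Empirical standard deviation of y' over independent bivariate normal samples."""
    rng = random.Random(seed)
    resid_sd = (1.0 - rho * rho) ** 0.5
    equated = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y = rho * x + sqrt(1 - rho^2) * e gives standardized (x, y) with correlation rho
        ys = [rho * x + resid_sd * rng.gauss(0.0, 1.0) for x in xs]
        k = sum(1 for x in xs if x < x0)
        if k == 0:
            continue  # no x below x0 in this sample; y' is undefined
        equated.append(sorted(ys)[k - 1])
    return statistics.stdev(equated)


# Fewer replications than the study's 1000, to keep the run short.
for x0 in (0.0, 1.0, 2.0):
    print(x0, round(monte_carlo_se(x0, reps=200), 3))
```

With only a few hundred replications the estimated standard deviations fluctuate by a few percent, so agreement with the table should be judged loosely.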

The resulting standard errors (not variances) are presented
in the fourth column of Table 1. The corresponding theoretical values
from (20) are shown in the third column. There is excellent agreement
between theoretical and Monte Carlo results: the two never differ by
more than .001.
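For the bivariate normal case, the theoretical column can be recomputed from an asymptotic variance that we derive independently here and use in place of (20), whose text is missing from this copy. Since $y' - y_0 \approx [q - \hat G(y_0)]/g_0$, where $\hat G$ is the sample distribution function of $y$, and the two indicators $x < x_0$ and $y < y_0$ disagree with probability $\beta_0 + \gamma_0 = 2\gamma_0$, we get $\operatorname{Var} y' \doteq 2\gamma_0/(N g_0^2)$ with $\gamma_0 = \Pr(x > x_0, y < y_0)$. The sketch assumes standardized margins, so $y_0 = x_0$, and computes $\gamma_0$ by Simpson's rule; the function names and integration settings are our own. The results should agree with the third column of Table 1 to within about .002, the largest gap being at $x_0 = 2.5$.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gamma0(x0, rho, width=12.0, steps=4000):
    """P(x > x0, y < x0) for standardized bivariate normal (x, y), by Simpson's rule.

    Integrates phi(x) * P(y < x0 | x) over x0 < x < x0 + width; steps must be even.
    """
    s = math.sqrt(1.0 - rho * rho)
    h = width / steps
    total = 0.0
    for i in range(steps + 1):
        x = x0 + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * phi(x) * Phi((x0 - rho * x) / s)
    return total * h / 3.0


def se_single_group(x0, rho, n):
    """sqrt(2 * gamma0 / (N * g0^2)), with y0 = x0 and g0 = phi(x0)."""
    return math.sqrt(2.0 * gamma0(x0, rho) / (n * phi(x0) ** 2))


for x0 in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"x0 = {x0:.1f}   standard error = {se_single_group(x0, 0.90, 1000):.3f}")
```

With $\rho = 0$ the same code gives $\gamma_0 = PQ$, so it also reproduces the second column of Table 1.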

The second column of Table 1 shows the standard error, according
to (8), when tests X and Y are administered to different examinees
rather than to the same examinees. We see that this does not entail
as serious a loss of equating accuracy as might have been feared; at
$x_0 = 0$, for example, the standard error rises from .030 to .056. In
view of the likelihood of practice and fatigue effects, it seems that the
methods of Sections 1 and 2 should be used whenever possible, rather
than the methods of Sections 3 and 4.
Table 1

Standard Errors of Equipercentile Equating for
Normally Distributed Variables

                            Standard Error
          Eq. (8) [or (20),       ρxy = .90     ρxy = .90
   x0     ρxy = 0],               Eq. (20)      Monte Carlo
          N1 = N2 = 1000

   0      .056                    .030          .029
   0.5    .059                    .032          .032
   1.0    .068                    .038          .037
   1.5    .086                    .052          .053
   2.0    .124                    .080          .079
   2.5    .200                    .138          .137
For illustrative purposes, Table 2 shows the standard errors of
an equating of a 50-item M (Metropolitan Achievement Test) Word
Analysis test to a 40-item C (Comprehensive Test of Basic Skills)
Reading Vocabulary test. The data were drawn from the Anchor Test
Study (Loret, Seder, Bianchini, and Vale, 1974), in which both
tests were administered to a group of 1406 sixth-grade students. The
resulting bivariate distribution of number-right scores was smoothed
by a method described by Lord (1980, Section 17.4). The correlation between
M and C was .88. The tabled values were computed from (12), ignoring
the smoothing.

The standard deviation of number-right M scores for this group of
sixth graders is 11.5. The standard error of measurement for M scores
is 2.7. The standard errors of equating in Table 2 (.17 to .44) are much
smaller than the standard error of measurement.

Table 3 provides an empirical comparison between equipercentile
equating and conventional linear equating. In this case, Form VSA4 of
the 90-item SAT Verbal test had been administered to 2665 students, along
with an 'anchor test' of 40 verbal items. At a later time, a new 85-item
Verbal form, XSA2, was similarly administered along with the same
anchor test to a new group of 2686 students. As part of normal scoring and
reporting, Form XSA2 raw ('formula') scores were equated by a standard
linear method due to L. R Tucker (Angoff, 1971, Equating Design IV.A.)
to the scaled scores on Form VSA4. This equating is shown, along with its
standard error as determined by the computer program AUTEST (Lord, 1975),
in the first three columns of Table 3.
Table 2

Standard Error of Equipercentile Equating, Number-Right Scores,
MAT to CTBS

   C         Cumulative      Equated     Standard Error
   Scores    Frequency       M Scores    of Equating
             Distribution

   37.5      .98             49.3        .17
   32.5      .89             45.8        .20
   27.5      .76             42.1        .22
   22.5      .61             37.6        .26
   17.5      .44             31.9        .32
   12.5      .26             23.1        .44
    7.5      .08             13.8        .36
    2.5      .01              8.8        .44
Table 3

Comparison of Linear and Equipercentile Equating for the Verbal Score on
Form XSA2, College Board Scholastic Aptitude Test

                Linear (Tucker) Model        Equipercentile Method
   XSA2         Equivalent    Standard       Equivalent    Standard
   formula      scaled        error          scaled        error
   score        score                        score

   78.1         738           4.07           774           13.47
   (missing)    685           3.46           722           15.85
   64.75        644           3.00           652           10.32
   58.9         602           2.56           602            4.97
   52.9         559           2.15           558            4.12
   47.25        519           1.82           514            3.47
   40.1         469           1.54           466            3.44
   32.4         414           1.51           417            2.93
   25.75        367           1.72           364            3.37
   16.1         298           2.28           314            4.07
   7.6          238           2.90           242            5.70
   -3.75        157           3.80           195            7.85
The equipercentile equating of XSA2 to VSA4 was effected by equating
each test to the anchor test independently and then using the rule that
scores equated to the same anchor-test score are equated to each other.
Thus the equipercentile equating of XSA2 to VSA4 requires two independent
equatings of the type treated in Section 3 of this paper. The sampling
variances (12) of the two equatings are additive, since they are computed
from two different samples of students. The resulting equipercentile
equating and its standard error are shown in the last two columns of the
table.

To put these standard errors into perspective, note that the
standard deviation of scaled scores in a group of students is typically
about 100. The standard error of measurement of a scaled score is about
30 to 33. The standard errors of equating are mostly small by comparison.
The standard errors of the equipercentile equating are about double those
of the linear equating in the middle of the score range and comparatively
larger at the extremes.

Equipercentile equating can be improved by smoothing the empirical
frequency distribution of scores before equating. This reduces the
sampling errors but may introduce small biases that do not disappear even
in very large samples. The sampling error of a smoothed equating could
perhaps be determined for a specified smoothing method, but the mathematics
would be burdensome.